y-Randomization and Its Variants in QSPR/QSAR
Abstract
y-Randomization is a tool for validating QSPR/QSAR models: the performance of the original model in data description (r²) is compared with that of models built for a permuted (randomly shuffled) response, using the original descriptor pool and the original model-building procedure. We compared y-randomization with several variants that use the original response, a permuted response, or a random-number pseudoresponse, combined with the original descriptors or random-number pseudodescriptors, in the typical setting of multilinear regression (MLR) with descriptor selection. For each combination of number of observations (compounds), number of descriptors in the final model, and number of descriptors in the pool to select from, computer experiments using the same descriptor selection method yield two different mean highest random r² values. The lower one is produced by y-randomization or by a variant likewise based on the original descriptors; the higher one is obtained from variants that use random-number pseudodescriptors. The difference is due to the intercorrelation of real descriptors in the pool. We propose comparing an original model's r² with both values whenever possible, and we discuss the meaning of the three possible outcomes of such a double test. Often y-randomization is not available to a potential user of a model because the values of all descriptors in the pool for all compounds have not been published. In such cases the random-number experiments proposed here are still possible. The test was applied to several recently published MLR QSAR equations, and cases of failure were identified. Some progress is also reported toward obtaining the mean highest r² of random pseudomodels by calculation rather than by tedious repeated simulations on random-number variables.
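The comparison described in the abstract can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch under stated assumptions, not the authors' procedure: greedy forward selection stands in for the unspecified "descriptor selection method", pseudodescriptors and pseudoresponses are drawn from a standard normal distribution, and the data set (30 compounds, a pool of 20 intercorrelated descriptors, 3 descriptors in the final model) is synthetic. The function names `r_squared`, `forward_select_r2`, and `mean_random_r2` are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """r2 of an ordinary least-squares fit of y on X, with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def forward_select_r2(X, y, k):
    """Greedy forward selection of k descriptors (an assumed selection
    method); returns the r2 of the final k-descriptor model."""
    chosen, best = [], 0.0
    for _ in range(k):
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        scores = [r_squared(X[:, chosen + [j]], y) for j in candidates]
        top = int(np.argmax(scores))
        chosen.append(candidates[top])
        best = float(scores[top])
    return best

def mean_random_r2(X, y, k, n_trials, rng, mode):
    """Mean highest r2 over n_trials pseudomodels.
    mode="permute_y": permuted response, original descriptors (y-randomization)
    mode="random_y":  random-number pseudoresponse, original descriptors
    mode="random_x":  original response, random-number pseudodescriptors
    """
    values = []
    for _ in range(n_trials):
        if mode == "permute_y":
            Xt, yt = X, rng.permutation(y)
        elif mode == "random_y":
            Xt, yt = X, rng.standard_normal(len(y))
        elif mode == "random_x":
            Xt, yt = rng.standard_normal(X.shape), y
        else:
            raise ValueError(f"unknown mode: {mode}")
        values.append(forward_select_r2(Xt, yt, k))
    return float(np.mean(values))

# Synthetic data (an assumption for illustration only): 30 compounds and a
# pool of 20 descriptors that share a latent factor, so they are intercorrelated.
rng = np.random.default_rng(0)
n, m, k = 30, 20, 3
latent = rng.standard_normal((n, 1))
X = 0.7 * latent + 0.7 * rng.standard_normal((n, m))
y = X[:, 0] + 0.3 * rng.standard_normal(n)  # response truly depends on descriptor 0

original_r2 = forward_select_r2(X, y, k)
r2_permuted_y = mean_random_r2(X, y, k, 30, rng, "permute_y")
r2_random_x = mean_random_r2(X, y, k, 30, rng, "random_x")

print(f"original model r2:                   {original_r2:.3f}")
print(f"mean highest r2, permuted response:  {r2_permuted_y:.3f}")
print(f"mean highest r2, pseudodescriptors:  {r2_random_x:.3f}")
```

In the double test proposed in the abstract, the original model's r² would be compared with both random means. How large the gap between the two means turns out to be depends on the intercorrelation structure of the pool, so the synthetic pool here should not be read as representative of real descriptor sets.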
Similar resources
Y-Randomization – A Useful Tool in QSAR Validation, or Folklore?
Several variants of randomization procedures were compared as a tool in validation of multilinear regression (MLR) QSAR equations that are obtained by descriptor selection. Y-randomization, a method formerly said to be probably the most powerful validation procedure, was found to be overoptimistic. The statistical significance of a new MLR QSAR model should be checked by comparing its measure o...
A novel topological descriptor based on the expanded Wiener index: Applications to QSPR/QSAR studies
In this paper, a novel topological index, named M-index, is introduced based on expanded form of the Wiener matrix. For constructing this index the atomic characteristics and the interaction of the vertices in a molecule are taken into account. The usefulness of the M-index is demonstrated by several QSPR/QSAR models for different physico-chemical properties and biological activities of a large...
QSPR Analysis with Curvilinear Regression Modeling and Topological Indices
Topological indices are real numbers associated with a molecular structure, obtained via its molecular graph G. Topological indices are used for QSPR, QSAR and structural design in chemistry, nanotechnology, and pharmacology. Moreover, physicochemical properties such as the boiling point, the enthalpy of vaporization, and stability can be estimated by QSAR/QSPR models. In this study, the QSPR (Quantitative St...
7. Orthogonalization methods in QSPR-QSAR Studies
We discuss some features of the orthogonalization methods commonly applied to QSPR-QSAR studies. We outline the well-known multivariable linear regression analysis in vector form in order to compare mainly the Randic and Gram-Schmidt orthogonalization procedures, and also lay the basis for other approaches such as Löwdin's. We expect that the present review may become the starting point for future dev...
QSPR designer – employ your own descriptors in the automated QSAR modeling process
The prediction of physical and chemical properties of molecules is a very important step in the drug discovery process. QSAR and QSPR models are strong tools for predicting these properties. The models employ descriptors and statistical approaches to provide an estimation of the desired property. An abundance of descriptors and QSAR/QSPR models were published, but the prediction of some propert...
Journal title:
- Journal of Chemical Information and Modeling
Volume 47, Issue 6
Pages: -
Publication date: 2007